9 research outputs found

    Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

    We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors. Neighbors are defined by a network graph with heterogeneous and stochastic interconnections. These interactions are determined by the sociability of each agent, which corresponds to the probability that the agent observes its neighbors. We design an algorithm for each agent to maximize its own expected cumulative reward and prove performance bounds that depend on the sociability of the agents and the network structure. We use the bounds to predict the rank ordering of agents according to their performance and verify the accuracy of this prediction analytically and computationally.
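The observation mechanism described above can be sketched as a small simulation: each round, every agent runs a UCB rule on its pooled samples, then, with probability equal to its sociability, also incorporates each neighbor's choice and reward. This is a minimal sketch, not the paper's algorithm; the arm means, sociability values, and fully connected graph are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_arms, horizon = 4, 3, 2000
arm_means = np.array([0.2, 0.5, 0.8])          # hypothetical Bernoulli arm means
sociability = np.array([0.1, 0.3, 0.6, 0.9])   # P(agent observes a neighbor each round)
# fully connected network for simplicity (assumption)
neighbors = [[j for j in range(n_agents) if j != i] for i in range(n_agents)]

counts = np.ones((n_agents, n_arms))           # samples per (agent, arm)
sums = rng.binomial(1, arm_means, (n_agents, n_arms)).astype(float)  # one initial sample each

regret = np.zeros(n_agents)
for t in range(n_arms, horizon):
    # UCB index over each agent's pooled (own + observed) samples
    ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    choices = ucb.argmax(axis=1)
    rewards = rng.binomial(1, arm_means[choices])
    regret += arm_means.max() - arm_means[choices]
    # each agent updates with its own sample ...
    for i in range(n_agents):
        counts[i, choices[i]] += 1
        sums[i, choices[i]] += rewards[i]
    # ... and, with probability equal to its sociability, each neighbor's sample
    for i in range(n_agents):
        for j in neighbors[i]:
            if rng.random() < sociability[i]:
                counts[i, choices[j]] += 1
                sums[i, choices[j]] += rewards[j]
```

More sociable agents pool more samples and so tighten their arm estimates faster, which is the mechanism behind the sociability-dependent bounds and the predicted rank ordering of agents.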

    Melting Pot 2.0

    Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence; it provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population") to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly studied extreme cases of perfectly competitive (zero-sum) motivations and perfectly cooperative (shared-reward) motivations, but does not stop with them. As in real life, a clear majority of scenarios in Melting Pot have mixed incentives: they are neither purely competitive nor purely cooperative, and thus demand that successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
    Comment: 59 pages, 54 figures. arXiv admin note: text overlap with arXiv:2107.0685

    Learning Through Social Interactions and Learning to Socially Interact in Multi-Agent Learning

    The rapid integration of AI agents into society underscores the need for a deeper understanding of how these agents can benefit from social interactions and develop collective intelligence. Cultural evolution studies have emphasized the importance of cultural transmission of knowledge and intelligence across generations, highlighting that social interactions play a crucial role in a group's ability to solve complex problems or make optimal decisions. Humans are remarkable at learning through social interactions: we possess an innate ability to seamlessly perceive social interactions, acquire and transmit knowledge through them, and transfer cognitive capabilities and knowledge across generations. A natural question is how we can embed these capabilities in AI agents. As a step towards answering this question, this dissertation investigates two main research questions: (1) how AI agents can learn to effectively communicate with other agents, and (2) how AI agents can enhance their ability to generalize or adapt to novel partners/opponents through social interactions. The first section of this dissertation focuses on developing methodologies that facilitate effective communication among AI agents under various communication constraints. We specifically examine communication in sequential decision-making tasks within uncertain environments, where the primary challenge lies in balancing exploration and exploitation to achieve optimal performance. To tackle this challenge, we propose methodologies that enable efficient communication and decision-making among agents, taking into account the intricacies of the problem domain, such as communication costs, different communication networks, and agent-specific probabilistic communication constraints. Further, we investigate the role of agent heterogeneity in individual and group performance and develop methods that leverage heterogeneity to improve performance.
    The second section delves into generalization in multi-agent AI. Our research investigates how agents can adapt their policies to collaborate with novel agents they have not previously encountered in tasks that necessitate coordination and cooperation to achieve optimal outcomes. We introduce new techniques that empower agents to learn and adapt their strategies to novel partners/opponents, fostering improved cooperation and coordination among AI agents. We investigate how heterogeneous social preferences of agents lead to behavioural diversity, and how learning a best response to diverse policies can lead to better generalization. In exploring these research areas, this dissertation aims to enrich our understanding of how AI agents can effectively collaborate in complex social scenarios, thereby contributing to the advancement of Artificial Intelligence.

    A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem

    We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that each time an agent makes an observation, it incurs a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward by minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of the analytical bounds using numerical simulations.
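One way to keep cumulative observation regret logarithmic, as the bound above requires, is to observe only on a geometrically spaced schedule, so an agent makes O(log T) observations over horizon T. The sketch below illustrates this idea for a single agent with one stand-in neighbor; the schedule, cost constant, and neighbor model are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

n_arms, horizon, c_obs = 3, 4096, 0.5         # c_obs: constant per-observation regret (assumption)
arm_means = np.array([0.3, 0.5, 0.9])         # hypothetical Bernoulli arm means

counts = np.ones(n_arms)                       # samples per arm
sums = rng.binomial(1, arm_means).astype(float)  # one initial sample per arm

sampling_regret = obs_regret = 0.0
# geometric observation grid: at most log2(T) + 1 observation times
obs_times = {2 ** k for k in range(int(np.log2(horizon)) + 1)}

for t in range(n_arms, horizon):
    ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    arm = int(ucb.argmax())
    reward = rng.binomial(1, arm_means[arm])
    counts[arm] += 1
    sums[arm] += reward
    sampling_regret += arm_means.max() - arm_means[arm]

    # observe a neighbor only on the geometric grid: O(log T) observations total,
    # so cumulative observation regret c_obs * O(log T) stays logarithmic
    if t in obs_times:
        nb_arm = int(rng.integers(n_arms))     # neighbor's choice (stand-in; not modeled here)
        nb_reward = rng.binomial(1, arm_means[nb_arm])
        counts[nb_arm] += 1
        sums[nb_arm] += nb_reward
        obs_regret += c_obs

total_regret = sampling_regret + obs_regret
```

The design trade-off is that each observation has a fixed cost, so the observation protocol must be sparse enough that its cumulative cost grows no faster than the sampling regret it helps reduce; a geometric schedule is one simple way to achieve that.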